(54) Polymorphism detection

(57) The present invention generally provides a rapid,
efficient method for analyzing polymorphic or biallelic
markers, and arrays for carrying out these analyses. In
general, the methods of the present invention employ
arrays of oligonucleotide probes that are complementary
to target nucleic acids which correspond to the marker
sequences of an individual. The probes are typically
arranged in detection blocks, each block being capable of
discriminating the three genotypes for a given marker,
e.g., the heterozygote or either of the two homozygotes.
The method allows for rapid, automatable analysis of
genetic linkage to even complex polygenic traits.
Description

BACKGROUND OF THE INVENTION 

The relationship between structure and function of macromolecules is of fundamental importance in the
understanding of biological systems. These relationships are important to understanding, for example, the functions of
enzymes, structural proteins and signalling proteins, ways in which cells communicate with each other, as well as
mechanisms of cellular control and metabolic feedback.

Genetic information is critical in continuation of life processes. Life is substantially informationally based and its
genetic content controls the growth and reproduction of the organism and its complements. The amino acid sequences
of polypeptides, which are critical features of all living systems, are encoded by the genetic material of the cell. Further,
the properties of these polypeptides, e.g., as enzymes, functional proteins, and structural proteins, are determined by
the sequence of amino acids which make them up. As structure and function are integrally related, many biological
functions may be explained by elucidating the underlying structural features which provide those functions, and these
structures are determined by the underlying genetic information in the form of polynucleotide sequences. Further, in
addition to encoding polypeptides, polynucleotide sequences also can be involved in control and regulation of gene
expression. It therefore follows that the determination of the make-up of this genetic information has achieved significant
scientific importance.

As a specific example, diagnosis and treatment of a variety of disorders may often be accomplished through
identification and/or manipulation of the genetic material which encodes for specific disease associated traits. In order to
accomplish this, however, one must first identify a correlation between a particular gene and a particular trait. This is
generally accomplished by providing a genetic linkage map through which one identifies a set of genetic markers that
follow a particular trait. These markers can identify the location of the gene encoding for that trait within the genome,
eventually leading to the identification of the gene. Once the gene is identified, methods of treating the disorder that
result from that gene, i.e., as a result of overexpression, constitutive expression, mutation, underexpression, etc., can
be more easily developed.

One class of genetic markers includes variants in the genetic code termed 'polymorphisms.' In the course of
evolution, the genome of a species can collect a number of variations in individual bases. These single base changes
are termed single-base polymorphisms. Polymorphisms may also exist as stretches of repeating sequences that vary
as to the length of the repeat from individual to individual. Where these variations are recurring, e.g., exist in a significant
percentage of a population, they can be readily used as markers linked to genes involved in mono- and polygenic traits.
In the human genome, single-base polymorphisms occur roughly once per 300 bp. Though many of these variant
bases appear too infrequently among the allele population for use as genetic markers (i.e., ≤1 %), useful polymorphisms
(e.g., those occurring in 20 to 50 % of the allele population) can be found approximately once per kilobase. Accordingly,
in a human genome of approximately 3 Gb, one would expect to find approximately 3,000,000 of these 'useful'
polymorphisms.
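The estimate above is simple division, and a short sketch makes the two densities explicit. The constant names are hypothetical; the figures (3 Gb, one polymorphism per 300 bp, one useful polymorphism per kilobase) are the approximate values stated in the text.

```python
# Approximate figures quoted in the text; the names are illustrative only.
GENOME_BP = 3_000_000_000             # human genome, about 3 Gb
BP_PER_POLYMORPHISM = 300             # one single-base polymorphism per ~300 bp
BP_PER_USEFUL_POLYMORPHISM = 1_000    # one "useful" polymorphism per ~1 kb

total_polymorphisms = GENOME_BP // BP_PER_POLYMORPHISM
useful_polymorphisms = GENOME_BP // BP_PER_USEFUL_POLYMORPHISM

print(total_polymorphisms)    # all single-base polymorphisms, order of magnitude
print(useful_polymorphisms)   # the subset frequent enough to serve as markers
```

On these assumptions roughly three in ten single-base polymorphisms would be frequent enough to serve as linkage markers.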

The use of polymorphisms as genetic linkage markers is thus of critical importance in locating, identifying and
characterizing the genes which are responsible for specific traits. In particular, such mapping techniques allow for the
identification of genes responsible for a variety of disease or disorder-related traits which may be used in the diagnosis
and/or eventual treatment of those disorders. Given the size of the human genome, as well as those of other mammals,
it would generally be desirable to provide methods of rapidly identifying and screening for polymorphic genetic markers.
The present invention meets these and other needs.

SUMMARY OF THE INVENTION

The present invention generally provides arrays and methods useful in screening large numbers of polymorphic
markers in a genome. In particular, the present invention provides arrays of oligonucleotide probes for detecting a
polymorphism in a target nucleic acid sequence. The array generally comprises at least one detection block of probes
which includes first and second groups of probes that are complementary to first and second variants of the target
nucleic acid sequence, respectively. The array further comprises third and fourth groups of probes having a sequence
identical to the first and second groups, respectively, but including all possible monosubstitutions of positions in the
sequence that are within n bases of a base corresponding to the polymorphism, where n is from 0 to 5.

The present invention also provides methods of using the above arrays in screening genomic material for the
presence of the polymorphisms.

BRIEF DESCRIPTION OF THE DRAWINGS 

Figure 1 shows a schematic illustration of light-directed synthesis of oligonucleotide arrays. 
Figure 2A shows a schematic representation of a single oligonucleotide array containing 78 separate detection
blocks. Figure 2B shows a schematic illustration of a detection block for a specific polymorphism denoted WI-567.
Figure 2B also shows the triplet layout of detection blocks for the polymorphism employing 20-mer oligonucleotide
probes having substitutions 7, 10 and 13 bp from the 3' end of the probe. The probes present in the shaded portions
of each detection block are shown adjacent to each detection block.

Figure 3 illustrates a tiling strategy for a polymorphism denoted WI-567, and having the sequence
5'-TGCTGCCTTGGTTC(A/G)AGCCCTCATCTCTTT-3'. A detection block specific for the WI-567 polymorphism is shown with the probe
sequences tiled therein listed above. Predicted patterns for both homozygous forms and the heterozygous form are
shown at the bottom.

Figure 4 shows a schematic representation of a detection block specific for the polymorphism denoted WI-1959
having the sequence 5'-ACCAAAAATCAGTC(T/C)GGGTAACTGAGAGTG-3' with the polymorphism indicated by the
brackets. A fluorescent scan of hybridization of the heterozygous and both homozygous forms is shown in the center,
with the predicted hybridization pattern for each being indicated below.

Figure 5 illustrates an example of a computer system used to execute the software of the present invention which
determines whether polymorphic markers in DNA are heterozygote, homozygote with a first polymorphic marker or
homozygote with a second polymorphic marker.

Figure 6 shows a system block diagram of computer system 1 used to execute the software of the present invention.

Figure 7 shows a probe array including probes with base substitutions at base positions within two base positions
of the polymorphic marker. The position of the polymorphic marker is denoted P0, and it may carry one of two
polymorphic markers x and y (where x and y are one of A, C, G, or T).

Figure 8 shows a probe array including probes with base substitutions at base positions within two base positions
of the polymorphic marker.

Figure 9 shows a high level flowchart of analyzing intensities to determine whether polymorphic markers in DNA
are heterozygote, homozygote with a first polymorphic marker or homozygote with a second polymorphic marker.

Figure 10A shows a tiling arrangement of an array tiled for detecting 246 different polymorphic markers, both sense
and antisense strands. Each different polymorphism detection block is indicated by a number representing a specific,
pre-identified polymorphism. Figure 10B shows a fluorescent scan of the array following exposure to fluorescently
labelled target sequence.

DETAILED DESCRIPTION OF THE INVENTION

I. General 

The present invention generally provides rapid and efficient methods for screening samples of genomic material
for polymorphisms, and arrays specifically designed for carrying out these analyses. In particular, the present invention
relates to the identification and screening of single base polymorphisms in a sample. In general, the methods of the
present invention employ arrays of oligonucleotide probes that are complementary to target nucleic acid sequence
segments from an individual (e.g., a human or other mammal) which target sequences include specific identified
polymorphisms, or "polymorphic markers." The probes are typically arranged in detection blocks, each block being capable
of discriminating the three genotypes for a given marker, e.g., the heterozygote or either of the two homozygotes. The
method allows for rapid, automatable analysis of genetic linkage to even complex polygenic traits.

Oligonucleotide arrays typically comprise a plurality of different oligonucleotide probes that are coupled to a surface
of a substrate in different known locations. These oligonucleotide arrays, also described as "Genechips™," have been
generally described in the art, for example, U.S. Patent No. 5,143,854 and PCT patent publication Nos. WO 90/15070
and 92/10092. These arrays may generally be produced using mechanical synthesis methods or light directed synthesis
methods which incorporate a combination of photolithographic methods and solid phase oligonucleotide synthesis
methods. See Fodor et al., Science, 251:767-777 (1991), Pirrung et al., U.S. Patent No. 5,143,854 (see also PCT
Application No. WO 90/15070) and Fodor et al., PCT Publication No. WO 92/10092 and U.S. Patent No. 5,424,186,
each of which is hereby incorporated herein by reference. Techniques for the synthesis of these arrays using mechanical
synthesis methods are described in, e.g., U.S. Patent No. 5,384,261, incorporated herein by reference in its entirety
for all purposes.

The basic strategy for light directed synthesis of oligonucleotides on a VLSIPS™ Array is outlined in Figure 1. The
surface of a substrate or solid support, modified with photosensitive protecting groups (X), is illuminated through a
photolithographic mask, yielding reactive hydroxyl groups in the illuminated regions. A selected nucleotide, typically
in the form of a 3'-O-phosphoramidite-activated deoxynucleoside (protected at the 5' hydroxyl with a photosensitive
protecting group), is then presented to the surface and coupling occurs at the sites that were exposed to light. Following
capping and oxidation, the substrate is rinsed and the surface is illuminated through a second mask, to expose
additional hydroxyl groups for coupling. A second selected nucleotide (e.g., 5'-protected, 3'-O-phosphoramidite-activated
deoxynucleoside) is presented to the surface. The selective deprotection and coupling cycles are repeated until the
desired set of products is obtained. Pease et al., Proc. Natl. Acad. Sci. (1994) 91:5022-5026. Since photolithography
is used, the process can be readily miniaturized to generate high density arrays of oligonucleotide probes. Furthermore,
the sequence of the oligonucleotides at each site is known.
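The masked deprotection and coupling cycle can be pictured with a small simulation. This is only a sketch: the function name, the representation of a mask as a set of site indices, and the three-site example are assumptions made for illustration, and capping, oxidation and coupling efficiency are ignored. Because each nucleoside is 3'-activated and 5'-protected, the first base coupled is taken here to be the surface-attached 3' end.

```python
def light_directed_synthesis(n_sites, cycles):
    """Simulate masked coupling cycles on a substrate with n_sites positions.

    cycles is an iterable of (nucleotide, illuminated_sites) pairs. In each
    cycle only the illuminated (deprotected) sites couple the nucleotide that
    is presented to the surface. Each probe is returned in order of coupling,
    so the first character is the surface-attached end.
    """
    probes = [""] * n_sites
    for nucleotide, illuminated_sites in cycles:
        for site in illuminated_sites:
            probes[site] += nucleotide
    return probes


# Three masks, three presented nucleotides (hypothetical example).
cycles = [("A", {0, 1}), ("C", {1, 2}), ("G", {0, 2})]
probes = light_directed_synthesis(3, cycles)
print(probes)  # ['AG', 'AC', 'CG']
```

Because the mask used in every cycle is known, the sequence at every site follows directly from the cycle list, which is the property the paragraph above relies on.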

II. Identification of Polymorphisms

The methods and arrays of the present invention primarily find use in the identification of so-called 'useful'
polymorphisms (i.e., those that are present in approximately 20% or more of the allele population). The present invention
also relates to the detection or screening of specific variants of previously identified polymorphisms.

A wide variety of methods can be used to identify specific polymorphisms. For example, repeated sequencing of
genomic material from large numbers of individuals, although extremely time consuming, can be used to identify such
polymorphisms. Alternatively, ligation methods may be used, where a probe having an overhang of defined sequence
is ligated to a target nucleotide sequence derived from a number of individuals. Differences in the ability of the probe
to ligate to the target can reflect polymorphisms within the sequence. Similarly, restriction patterns generated from
treating a target nucleic acid with a prescribed restriction enzyme or set of restriction enzymes can be used to identify
polymorphisms. Specifically, a polymorphism may result in the presence of a restriction site in one variant but not in
another. This yields a difference in restriction patterns for the two variants, and thereby identifies a polymorphism.
Oligonucleotide arrays may also be used to identify polymorphisms. For example, as described in U.S. Patent
Application Serial No. 08/465,606, filed June 7, 1995, polymorphisms may be identified using type-IIs endonucleases to
capture and amplify ambiguous base sequences adjacent to the restriction sites. The captured sequences are then
characterized on oligonucleotide arrays. The patterns of these captured sequences are compared from various individuals,
the differences being indicative of potential polymorphisms. Alternative array-based methods may also be used to
identify polymorphisms, including the methods described in U.S. Patent Application No. 08/629,031, filed April 8, 1996.
Briefly, these methods hybridize a target nucleic acid against an appropriately tiled array, e.g., having probes
complementary to step-wise segments of the target sequence. The ratio of hybridization intensity of perfectly matched probes
to mismatched probes is plotted as a function of the position that is being interrogated in the sequence, for each individual
screened. Where a polymorphism is present, it yields a discrepancy between the data plotted for the individuals, e.g.,
a point of separation of the plots of two or more individuals.
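The comparison of perfect-match to mismatch ratios across individuals can be sketched as follows. The function names, the two-fold separation criterion and the intensity values are all assumptions chosen for illustration; the text does not specify how a "point of separation" is scored, and the numbers are not measured data.

```python
def pm_mm_ratios(perfect_match, mismatch):
    """Ratio of perfect-match to mismatch intensity at each interrogated position.

    Assumes every mismatch intensity is greater than zero.
    """
    return [pm / mm for pm, mm in zip(perfect_match, mismatch)]


def divergent_positions(ratios_a, ratios_b, fold=2.0):
    """Positions where two individuals' ratio plots separate by at least `fold`."""
    hits = []
    for position, (a, b) in enumerate(zip(ratios_a, ratios_b)):
        high, low = max(a, b), min(a, b)
        if low > 0 and high / low >= fold:
            hits.append(position)
    return hits


# Illustrative intensities for four interrogated positions in two individuals.
individual_1 = pm_mm_ratios([900, 850, 880, 870], [100, 100, 110, 100])
individual_2 = pm_mm_ratios([910, 840, 300, 860], [100, 105, 150, 100])
candidates = divergent_positions(individual_1, individual_2)
print(candidates)  # [2]
```

A position flagged this way is only a candidate polymorphism; the text goes on to describe how candidates are confirmed against a separate pool of individuals.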

In a preferred aspect, the identification of polymorphisms takes into account the assumption that a useful
polymorphism (i.e., one that occurs in 20 to 50% of the allele population) occurs approximately once per 1 kb in a given
genome. In particular, random sequences of a genome, e.g., random 1 kb sequences of the human genome such as
expressed sequence tags or 'ESTs', can be sequenced from a limited number of individuals. When a variant base is
detected with sufficient frequency, it is designated a 'useful' polymorphism. In practice, the method generally analyzes
the same 1 kb sequence from a small number of unrelated individuals, i.e., from 3 to 5 individuals (6 to 10 alleles).
Where a variant sequence is identified, it is then compared to a separate pool of material from unrelated individuals.
Where the variant sequence identified from the first set of individuals is detectable in the pool of the second set, it is
assumed to exist at a sufficiently high frequency, e.g., at least about 20% of the allele population, thereby qualifying
as a useful marker for genetic linkage analysis.

III. Screening Polymorphisms

Screening polymorphisms in samples of genomic material according to the methods of the present invention is
generally carried out using arrays of oligonucleotide probes. These arrays may generally be 'tiled' for a large number
of specific polymorphisms. By 'tiling' is generally meant the synthesis of a defined set of oligonucleotide probes which
is made up of a sequence complementary to the target sequence of interest, as well as preselected variations of that
sequence, e.g., substitution of one or more given positions with one or more members of the basis set of monomers,
i.e., nucleotides. Tiling strategies are discussed in detail in Published PCT Application No. WO 95/11995, incorporated
herein by reference in its entirety for all purposes. By 'target sequence' is meant a sequence which has been identified
as containing a polymorphism, and more particularly, a single-base polymorphism, also referred to as a 'biallelic base.'
It will be understood that the term 'target sequence' is intended to encompass the various forms present in a particular
sample of genomic material, i.e., both alleles in a diploid genome.

In a particular aspect, arrays are tiled for a number of specific, identified polymorphic marker sequences. In
particular, the array is tiled to include a number of detection blocks, each detection block being specific for a specific
polymorphic marker or set of polymorphic markers. For example, a detection block may be tiled to include a number
of probes which span the sequence segment that includes a specific polymorphism. To ensure probes that are
complementary to each variant, the probes are synthesized in pairs differing at the biallelic base.

In addition to the probes differing at the biallelic bases, monosubstituted probes are also generally tiled within the
detection block. These monosubstituted probes have bases at and up to a certain number of bases in either direction
from the polymorphism, substituted with the remaining nucleotides (selected from A, T, G, C or U). Typically, the probes
in a tiled detection block will include substitutions of the sequence positions up to and including those that are 5 bases
away from the base that corresponds to the polymorphism. Preferably, bases up to and including those in positions 2
bases from the polymorphism will be substituted. The monosubstituted probes provide internal controls for the tiled
array, to distinguish actual hybridization from artifactual cross-hybridization. An example of this preferred substitution
pattern is shown in Figure 3.
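The tiling rule described above (a perfect-match probe for each variant, plus every single-base substitution at positions within n bases of the polymorphic base) can be sketched as below. The function and key names are hypothetical, and for simplicity the probes here span the whole listed WI-567 sequence from Figure 3 rather than the 20-mers described for Figure 2B; this is an illustration of the rule, not the tiling actually used on the array.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def tile_detection_block(upstream, variants, downstream, n=2):
    """Build perfect-match and monosubstituted probes for each variant.

    The target is upstream + variant base + downstream (written 5' to 3').
    Probes are the reverse complement of the target, so the polymorphic base
    sits at probe index len(downstream). Positions within n bases of it are
    each replaced by the three other nucleotides.
    """
    if n > min(len(upstream), len(downstream)):
        raise ValueError("n exceeds the available flanking sequence")
    center = len(downstream)
    block = {}
    for variant in variants:
        perfect_match = reverse_complement(upstream + variant + downstream)
        monosubstituted = []
        for position in range(center - n, center + n + 1):
            for base in "ACGT":
                if base != perfect_match[position]:
                    monosubstituted.append(
                        perfect_match[:position] + base + perfect_match[position + 1:]
                    )
        block[variant] = {
            "perfect_match": perfect_match,
            "monosubstituted": monosubstituted,
        }
    return block


# WI-567: 5'-TGCTGCCTTGGTTC(A/G)AGCCCTCATCTCTTT-3'
block = tile_detection_block("TGCTGCCTTGGTTC", "AG", "AGCCCTCATCTCTTT", n=2)
print(len(block["A"]["monosubstituted"]))  # 5 positions x 3 substitutions
```

Note that substituting the polymorphic position of one variant's probe reproduces the other variant's perfect-match probe, which is why the paired probes and the substitution set overlap at that position.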

A variety of tiling configurations may also be employed to ensure optimal discrimination of perfectly hybridizing
probes. For example, a detection block may be tiled to provide probes having optimal hybridization intensities with
minimal cross-hybridization. For example, where a sequence downstream from a polymorphic base is G-C rich, it could
potentially give rise to a higher level of cross-hybridization or "noise" when analyzed. Accordingly, one can tile the
detection block to take advantage of more of the upstream sequence. Such alternate tiling configurations are
schematically illustrated in Figure 2B, bottom, where the base in the probe that is complementary to the polymorphism is
placed at different positions in the sequence of the probe relative to the 3' end of the probe. For ease of discussion,
both the base which represents the polymorphism and the complementary base in the probe are referred to herein as
the "polymorphic base" or "polymorphic marker."

Optimal tiling configurations may be determined for any particular polymorphism by comparative analysis. For
example, triplet or larger detection blocks like those illustrated in Figure 2B may be readily employed to select such
optimal tiling strategies.

Arrays may be tiled for one or both strands of the target sequence, i.e., the sequence including the polymorphism.
The inclusion of probes that hybridize to both the sense and antisense strands, either on a single array or separate
arrays, provides an additional level of verification for a given interrogation. Thus, in addition to probes that are
complementary to one strand of a target sequence, a detection block will also include probes that are complementary to the
antisense strand of the target sequence, and which are therefore complementary to the first group of probes.

Additionally, arrays will generally be tiled to provide for ease of reading and analysis. For example, the probes tiled
within a detection block will generally be arranged so that reading across a detection block, the probes are tiled in
succession, i.e., progressing along the target sequence one or more bases at a time (see, e.g., Figure 3, middle).

Once an array is appropriately tiled for a given polymorphism or set of polymorphisms, the target nucleic acid is
hybridized with the array and scanned. Hybridization and scanning are generally carried out by methods described in,
e.g., Published PCT Application Nos. WO 92/10092 and WO 95/11995, and U.S. Patent No. 5,424,186, previously
incorporated herein by reference in their entirety for all purposes. In brief, a target nucleic acid sequence which includes
one or more previously identified polymorphic markers is amplified by well known amplification techniques, e.g., PCR.
Typically this involves the use of primer sequences that are complementary to the two strands of the target sequence
both upstream and downstream from the polymorphism. Asymmetric PCR techniques may also be used, i.e., where
an array is tiled for only a sense or antisense strand. Amplified target, generally incorporating a label, is then hybridized
with the array under appropriate conditions. Incorporation of a label generally involves incorporating a labeled
nucleotide into the amplification reaction, whereby the label is incorporated into the target nucleic acid. Typically useful labels
include fluorescent labels coupled to nucleotides, as well as well known binding groups, e.g., biotin, streptavidin and
the like, to which a labelled complement may be later bound.

Upon completion of hybridization and washing of the array, the array is scanned to determine the position on the
array to which the target sequence hybridizes. The hybridization data obtained from the scan, i.e., in the form of
fluorescence intensities or some other detectable label or dye, is then plotted as a function of location on the array.

Although primarily described in terms of a single detection block, e.g., for detection of a single polymorphism, in
preferred aspects, the arrays of the invention will include multiple detection blocks, and thus be capable of analyzing
multiple, specific polymorphisms. For example, preferred arrays will generally include from about 50 to about 4000
different detection blocks, with particularly preferred arrays including from 100 to 3000 different detection blocks.

In alternate arrangements, it will generally be understood that detection blocks may be grouped within a single
array or in multiple, separate arrays so that varying, optimal conditions may be used during the hybridization of the
target to the array. For example, it may often be desirable to provide for the detection of those polymorphisms that fall
within G-C rich stretches of a genomic sequence, separately from those falling in A-T rich segments. This allows for
the separate optimization of hybridization conditions for each situation.

IV. Calling 

After hybridization and scanning, the hybridization data from the scanned array is then analyzed to identify which
variant or variants of the polymorphic marker are present in the sample, or target sequence, as determined from the
probes to which the target hybridized, e.g., one of the two homozygote forms or the heterozygote form. This
determination is termed "calling" the genotype. Calling the genotype is typically a matter of comparing the hybridization data






EP 0 785 280 A2



for each potential variant, and based upon that comparison, identifying the actual variant (for homozygotes) or variants
(for heterozygotes) that are present. In one aspect, this comparison involves taking the ratio of hybridization intensities
(corrected for average background levels) for the expected perfectly hybridizing probes for a first variant versus that
of the second variant. Where the marker is homozygous for the first variant, this ratio will be a large number, theoretically
approaching an infinite value. Where homozygous for the second variant, the ratio will be a very low number, i.e.,
theoretically approaching zero. Where the marker is heterozygous, the ratio will be approximately 1. These numbers
are, as described, theoretical. Typically, the first ratio will be well in excess of 1, e.g., 2, 4, 5 or greater. Similarly, the
second ratio will typically be substantially less than 1, e.g., 0.5, 0.2, 0.1 or less. The ratio for heterozygotes will typically
be approximately equal to 1, i.e., from 0.7 to 1.5. These ratios can vary based upon the specific sequence surrounding
the polymorphism, and can also be adjusted based upon a standard hybridization with a control sample containing the
variants of the polymorphism. The ratios may be put on a linear scale by taking the log10 of the ratio and multiplying the
result by 10. This makes it easier to interpret the results of the comparison of the intensities observed.
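A genotype call based on this ratio can be sketched as follows. The function name, the "no call" outcome for ratios that fall between the bands, and the small floor used to avoid dividing by zero are assumptions for illustration; the cut-offs (2 or greater, 0.5 or less, 0.7 to 1.5) are the example values given in the text and would in practice depend on the sequence context and control hybridizations.

```python
import math


def call_genotype(first_intensity, second_intensity, background=0.0):
    """Call a genotype from background-corrected perfect-match intensities.

    Returns (call, ratio, scaled), where scaled is 10 * log10(ratio).
    """
    first = max(first_intensity - background, 0.0)
    second = max(second_intensity - background, 1e-5)  # assumed floor, avoids /0
    ratio = first / second

    if ratio >= 2.0:
        call = "homozygous, first variant"
    elif ratio <= 0.5:
        call = "homozygous, second variant"
    elif 0.7 <= ratio <= 1.5:
        call = "heterozygous"
    else:
        call = "no call"

    scaled = 10 * math.log10(ratio) if ratio > 0 else float("-inf")
    return call, ratio, scaled


print(call_genotype(1100, 150, background=100))
print(call_genotype(600, 600, background=100))
print(call_genotype(150, 1100, background=100))
```

On the scaled axis a heterozygote sits near 0 and the two homozygotes fall symmetrically on either side, which is what makes the log form easier to read than the raw ratio.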

The quality of a given call for a particular genotype may also be checked. For example, the maximum perfect match
intensity can be divided by a measure of the background noise (which may be represented by the standard deviation
of the mismatched intensities). Where the ratio exceeds some preselected cut-off point, the call is determined to be
good. For example, where the maximum intensity of the expected perfect matches exceeds twice the noise level, it
might be termed a good call. In an additional aspect, the present invention provides software for performing the above
described comparisons.
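The quality check can be sketched in a few lines. The function name and the handling of zero noise are assumptions, as is the choice of the sample standard deviation (`statistics.stdev`); the text only says that the standard deviation of the mismatch intensities may represent the noise and gives a cut-off of two as an example. The intensity values are illustrative.

```python
import statistics


def is_good_call(perfect_match_intensities, mismatch_intensities, cutoff=2.0):
    """Return True when the maximum perfect-match signal exceeds cutoff x noise.

    Noise is estimated as the sample standard deviation of the mismatch
    intensities (needs at least two mismatch values).
    """
    signal = max(perfect_match_intensities)
    noise = statistics.stdev(mismatch_intensities)
    if noise == 0:
        # Assumed convention: with no measurable noise, any signal passes.
        return signal > 0
    return signal / noise > cutoff


print(is_good_call([1000, 950], [100, 120, 90, 110]))  # strong signal, low noise
print(is_good_call([30, 25], [10, 60, 20, 90]))        # weak signal, high noise
```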

Fig. 5 illustrates an example of a computer system used to execute the software of the present invention which
determines whether polymorphic markers in DNA are heterozygote, homozygote with a first variant of a polymorphism
or homozygote with a second variant of a polymorphism. Fig. 5 shows a computer system 1 which includes a monitor
3, screen 5, cabinet 7, keyboard 9, and mouse 11. Mouse 11 may have one or more buttons such as mouse buttons
13. Cabinet 7 houses a CD-ROM drive 15 or a hard drive (not shown) which may be utilized to store and retrieve
software programs incorporating the present invention, digital images for use with the present invention, and the like.
Although a CD-ROM 17 is shown as the removable media, other removable tangible media including floppy disks,
tape, and flash memory may be utilized. Cabinet 7 also houses familiar computer components (not shown) such as a
processor, memory, and the like.

Fig. 6 shows a system block diagram of computer system 1 used to execute the software of the present invention.
As in Fig. 5, computer system 1 includes monitor 3 and keyboard 9. Computer system 1 further includes subsystems
such as a central processor 102, system memory 104, I/O controller 106, display adapter 108, removable disk 112,
fixed disk 116, network interface 118, and speaker 120. Other computer systems suitable for use with the present
invention may include additional or fewer subsystems. For example, another computer system could include more than
one processor 102 (i.e., a multi-processor system) or a cache memory.

Arrows such as 122 represent the system bus architecture of computer system 1. However, these arrows are
illustrative of any interconnection scheme serving to link the subsystems. For example, a local bus could be utilized to
connect the central processor to the system memory and display adapter. Computer system 1 shown in Fig. 6 is but
an example of a computer system suitable for use with the present invention. Other configurations of subsystems
suitable for use with the present invention will be readily apparent to one of ordinary skill in the art.

Fig. 7 shows a probe array including probes with base substitutions at base positions within two base positions of
the polymorphic marker. The position of the polymorphic marker is denoted P0, and it may carry one of two variants
of the polymorphic marker, x and y (where x and y are one of A, C, G, or T). As indicated, at P-2 there are two columns
of four cells which contain a base substitution two base positions to the left, or 3', from the polymorphic marker. The
column denoted by an "x" contains polymorphic marker x and the column denoted by a "y" contains polymorphic marker
y.

Similarly, P-1 contains probes with base substitutions one base position to the left, or 3', of the polymorphic marker.
P0 contains probes with base substitutions at the polymorphic marker position. Accordingly, the two columns in P0 are
identical. P1 and P2 contain base substitutions one and two base positions to the right, or 5', of the polymorphic marker,
respectively.

As a hypothetical example, assume a single base polymorphism exists where one allele contains the subsequence
TCAAG whereas another allele contains the subsequence TCGAG, where the middle base (underlined in the original)
indicates the polymorphism in each allele. Fig. 8 shows a probe array including probes with base substitutions at base
positions within two base positions of the polymorphic marker. In the first two columns, the cells which contain probes
with base A (complementary to T in the alleles) two positions to the left of the polymorphic marker are shaded. They are
shaded to indicate that it is expected that these cells would exhibit the highest hybridization to the labeled sample nucleic acid.
Similarly, the second two columns have cells shaded which have probes with base G (complementary to C in the
alleles) one position to the left of the polymorphic marker.

At the polymorphic marker position (corresponding to P0 in Fig. 7), there are two columns: one denoted by an 'A'
and one denoted by a 'G'. Although, as indicated earlier, the probes in these two columns are identical, the probes
contain base substitutions for the polymorphic marker position. An 'N' indicates the cells that have probes which are
expected to exhibit a strong hybridization if the allele contains a polymorphic marker A. As will become apparent in the
following paragraphs, 'N' stands for numerator because the intensity of these cells will be utilized in the numerator of
an equation. Thus, the labels were chosen to aid the reader's understanding of the present invention.
A 'D' indicates the cells that have probes which are expected to exhibit a strong hybridization if the allele contains
a polymorphic marker G. 'D' stands for denominator because the intensity of these cells will be utilized in the
denominator of an equation. The 'n' and 'd' labeled cells indicate these cells contain probes with a single base mismatch
near the polymorphic marker. As before, the labels indicate where the intensity of these cells will be utilized in a following
equation.
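The cell labels can be reproduced with a short sketch for the TCAAG / TCGAG example. The layout below is a reconstruction, not a copy of Fig. 8: it assumes two columns (x and y) of four cells at each of the five positions, labels the non-matching cells of the x columns 'n' and of the y columns 'd', and ignores strand orientation by complementing each allele base in place. The function name and the "shaded" label for the expected best hybridizers in the flanking columns are hypothetical. Under these assumptions the counts agree with the divisors of 2 and 14 used in the text for the averages.

```python
from collections import Counter

BASES = "ACGT"
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def label_cells(allele_x, allele_y, n=2):
    """Label every cell as N, D, n, d or shaded.

    Keys are (offset from the polymorphic position, column, probe base).
    """
    assert len(allele_x) == len(allele_y) == 2 * n + 1
    center = n
    labels = {}
    for offset in range(-n, n + 1):
        for column, allele, mismatch in (("x", allele_x, "n"), ("y", allele_y, "d")):
            for probe_base in BASES:
                if offset == 0:
                    # Both columns are identical at the polymorphic position.
                    if probe_base == COMPLEMENT[allele_x[center]]:
                        label = "N"
                    elif probe_base == COMPLEMENT[allele_y[center]]:
                        label = "D"
                    else:
                        label = mismatch
                elif probe_base == COMPLEMENT[allele[center + offset]]:
                    label = "shaded"
                else:
                    label = mismatch
                labels[(offset, column, probe_base)] = label
    return labels


labels = label_cells("TCAAG", "TCGAG")
counts = Counter(labels.values())
print(dict(counts))
```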

Fig. 9 shows a high level flowchart of analyzing intensities to determine whether polymorphic markers in DNA are
heterozygote, homozygote with a first polymorphic marker or homozygote with a second polymorphic marker. At step
202, the system receives the fluorescent intensities of the cells on the chip. Although in a preferred embodiment the
hybridization of the probes to the sample is determined from fluorescent intensities, other methods and labels
including radioactive labels may be utilized with the present invention. An example of one embodiment of a software program
for carrying out this analysis is reprinted in Software Appendix A.

A perfect match (PM) average for a polymorphic marker x is determined by averaging the intensity of the cells at
P0 that have the base substitution equal to x in Fig. 7. Thus, for the example in Fig. 8, the perfect match average for
A would add the intensities of the cells denoted by 'N' and divide the sum by 2.

A mismatch (MM) average for a polymorphic marker x is determined by averaging the intensity of the cells that
contain the polymorphic marker x and a single base mismatch in Fig. 7. Thus, for the example in Fig. 8, the mismatch
average for A would be the sum of the intensities of the cells denoted by 'n' divided by 14.

A perfect match average and a mismatch average for polymorphic marker y are determined in a similar manner utilizing
the cells denoted by 'D' and 'd', respectively. Therefore, the perfect match averages are an average intensity of cells
containing probes that are perfectly complementary to an allele. The mismatch averages are an average of intensity
of cells containing probes that have a single base mismatch near the polymorphic marker in an allele.
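
As an illustration only, the averaging described above can be sketched in Python. The input layout (a list of (label, intensity) pairs using the Fig. 8 labels 'N', 'n', 'D' and 'd'), the function name and the example intensities are assumptions made for this sketch; they are not taken from Software Appendix A.

```python
from collections import defaultdict


def pm_mm_averages(cells):
    """Average cell intensities per Fig. 8 label.

    cells: iterable of (label, intensity) pairs with label in
    {'N', 'n', 'D', 'd'} (a hypothetical input layout).
    Returns the perfect match (PM) and mismatch (MM) averages for
    marker x (labels 'N'/'n') and marker y (labels 'D'/'d').
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for label, intensity in cells:
        totals[label] += intensity
        counts[label] += 1

    def mean(label):
        return totals[label] / counts[label]

    return {"pm_x": mean("N"), "mm_x": mean("n"),
            "pm_y": mean("D"), "mm_y": mean("d")}


# Made-up intensities: 2 'N' cells, 14 'n' cells, 2 'D' cells, 14 'd' cells.
example = ([("N", 900.0), ("N", 1100.0)] + [("n", 100.0)] * 14 +
           [("D", 150.0), ("D", 250.0)] + [("d", 120.0)] * 14)
averages = pm_mm_averages(example)
```

With 2 perfect match cells and 14 mismatch cells per marker, the divisors 2 and 14 in the text fall out of the cell counts rather than being hard-coded.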

At step 204, the system calculates a Ratio of the perfect match and mismatch averages for x to the perfect match
and mismatch averages for y. The numerator of the Ratio includes the mismatch average for x subtracted from the
perfect match average for x. In a preferred embodiment, if the resulting numerator is less than 0, the numerator is set equal
to 0.

The denominator of the Ratio includes the mismatch average for y subtracted from the perfect match average for y. In
a preferred embodiment, if the resulting denominator is less than or equal to 0, the denominator is set equal to a
minimum value, i.e., 0.00001.
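
The Ratio of step 204, including both clamping rules of the preferred embodiment, might be written as follows. This is a minimal Python sketch; the function and parameter names are hypothetical, and only the clamping values (0 for the numerator, 0.00001 for the denominator) come from the text.

```python
def intensity_ratio(pm_x, mm_x, pm_y, mm_y, min_denominator=0.00001):
    """Ratio of (PM - MM) for marker x to (PM - MM) for marker y."""
    # Numerator: mismatch average for x subtracted from the PM average for x,
    # set to 0 when negative.
    numerator = pm_x - mm_x
    if numerator < 0:
        numerator = 0.0
    # Denominator: same difference for y, set to a small minimum value
    # when it is zero or negative so the division stays defined.
    denominator = pm_y - mm_y
    if denominator <= 0:
        denominator = min_denominator
    return numerator / denominator


ratio = intensity_ratio(1000.0, 100.0, 200.0, 120.0)  # (1000-100)/(200-120)
```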

Once the system has calculated the Ratio, the system calculates DB at step 206. DB is calculated by the equation
DB = 10*log10(Ratio). The logarithmic function puts the ratio on a linear scale and makes it easier to interpret the results
of the comparison of intensities.
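
A short Python sketch of step 206 follows. The function name is hypothetical. Because the numerator of the Ratio may have been set to 0, the sketch substitutes 0.00001 for a non-positive Ratio before taking the logarithm; this floor is an assumption based on what appears to be the corresponding step in Software Appendix A.

```python
import math


def decibel_score(ratio, min_ratio=0.00001):
    """DB = 10 * log10(Ratio), with a floor so log10 is always defined."""
    if ratio <= 0:
        ratio = min_ratio  # assumed floor; log10(0) is undefined
    return 10 * math.log10(ratio)
```

Under these assumptions a Ratio of 1 gives a DB of 0, a Ratio of 100 gives about 20, and a Ratio of 0 gives about -50, which agrees with the practical range of roughly -50 to 50 mentioned for step 210.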

At step 208, the system performs a statistical check on the data or hybridization intensities. The statistical check
is performed to determine if the data will likely produce good results. In a preferred embodiment, the statistical check
involves testing whether the maximum of the perfect match averages for x or y is at least twice as great as the standard
deviation of the intensities of all the cells containing a single base mismatch (i.e., denoted by an 'n' or 'd' in Fig. 8). If
the perfect match average is at least two times greater than this standard deviation, the data is likely to produce good
results and this is communicated to the user.
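
The check of step 208 can be sketched in Python as below. The function name is hypothetical, and the use of the population standard deviation (statistics.pstdev) is an assumption, since the text does not say which form of the standard deviation is intended.

```python
import statistics


def passes_statistical_check(pm_x, pm_y, mismatch_intensities, factor=2.0):
    """True when the larger PM average is at least `factor` times the
    standard deviation of all single-base-mismatch cell intensities."""
    spread = statistics.pstdev(mismatch_intensities)  # assumed: population SD
    return max(pm_x, pm_y) >= factor * spread
```

For example, with perfect match averages of 1000 and 200 and mismatch intensities clustered around 100, the check passes easily; it fails when the mismatch cells vary by more than half of the strongest perfect match average.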

The system analyzes DB at step 210 to determine if DB is approaching negative infinity, near 0, or approaching positive
infinity. In practice, the DB will typically not go beyond 50 or -50. If DB is approaching a negative infinity (e.g., -50),
the system determines that the sample DNA contains a homozygote with a first polymorphic marker corresponding to x
at step 212. If DB is near 0, the system determines that the sample DNA contains a heterozygote corresponding to both
polymorphic markers x and y at step 214. Although described as approaching infinity, etc., as described previously, these
numbers will generally vary, but are nonetheless indicative of the calls described. If DB is approaching a positive
infinity (e.g., +50), the system determines that the sample DNA contains a homozygote with a second polymorphic marker
corresponding to y at step 216.
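
The three-way decision of steps 212 to 216 can be illustrated with the following Python sketch. The text gives only indicative values (near 0, toward -50, toward +50), so the cutoff near_zero=10.0 is purely hypothetical, as is the function name. The sketch assigns negative DB to step 212 and positive DB to step 216 exactly as those steps are worded; if the Ratio is formed with the other marker in the numerator, the two homozygote calls swap.

```python
def call_genotype(db, near_zero=10.0):
    """Map a DB value to one of the three calls of steps 212, 214 and 216.

    near_zero is a hypothetical cutoff; the text does not state one.
    """
    if abs(db) <= near_zero:
        return "heterozygote"                   # step 214
    if db < 0:
        return "homozygote, first marker"       # step 212
    return "homozygote, second marker"          # step 216
```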

A visual inspection of the Ratio equation in step 204 shows that the numerator should be higher than the
denominator if the DNA sample only has the polymorphic marker corresponding to x. Similarly, the denominator should be
higher than the numerator if the DNA sample only has a polymorphic marker corresponding to y. If the DNA sample
has both polymorphic markers, indicating a heterozygote, the Ratio should be approximately equal to 1, which results
in a 0 when the logarithm of the Ratio is calculated.

The equations discussed above illustrate just one embodiment of the present invention. These equations have
correctly identified polymorphic markers when a visual inspection would seem to indicate a different result. This may
be the case because the equations take into account the mismatch intensities in order to determine the presence or
absence of the polymorphic markers. Additional methods may also be employed to properly compare the hybridization
intensities, including, e.g., principal component analysis and the like.

Those of skill in the art, upon reading the instant disclosure, will appreciate that a variety of modifications to the
above described methods and arrays may be made without departing from the spirit or scope of the invention. For
example, one may select the strand of the target sequence to optimize the ability to call a particular genome.
Alternatively, one may analyze both strands, in parallel, to provide greater amounts of data from which a call can be made.
Additionally, the analyses, i.e., amplification and scanning, may be performed using DNA, RNA, mixed polymers, and
the like.

The present invention is further illustrated by the following examples. These examples are merely to illustrate
aspects of the present invention and are not intended as limitations of this invention.

V. Examples

Example 1 - Chip Tiling

A DNA chip is prepared which contains three detection blocks for each of 78 identified single base polymorphisms
or biallelic markers, in a segment of human DNA (the 'target' nucleic acid). Each detection block contains probes
wherein the identified polymorphism occurs at the position in the target nucleic acid complementary to the 7th, 10th
and 13th positions from the 3' end of 20-mer oligonucleotide probes. A schematic representation of a single
oligonucleotide array containing all 78 detection blocks is shown in Figure 2A.
The tiling strategy for each block substitutes bases in the positions at, and up to two bases in either direction from,
the polymorphism. In addition to the substituted positions, the oligonucleotides are synthesized in pairs differing at the
biallelic base. Thus, the layout of the detection block (containing 40 different oligonucleotide probes) allows for
controlled comparison of the sequences involved, as well as simple readout without need for complicated instrumentation.
A schematic illustration of this tiling strategy within a single detection block is shown in Figure 3, for a specific
polymorphic marker denoted WI-567.
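
To make the tiling concrete, the following Python sketch enumerates the 40 cells of one detection block (2 alleles x 5 interrogated positions x 4 substituted bases). It is a sketch under stated assumptions, not the synthesis procedure itself: the function name and argument layout are hypothetical, the substituted base is held at a fixed position from the 3' end while the probe window slides along the target (as the probe rows of Figure 3 suggest), and the WI-567 alleles A and G are inferred from the [T] and [C] bases in those probe rows.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def tile_detection_block(left_flank, alleles, right_flank,
                         sub_pos=10, probe_len=20, window=2):
    """Return the cells of one detection block.

    Each cell is (allele, offset, base, probe). The probe is written
    3'->5', is complementary to the target carrying `allele`, and has
    `base` at position `sub_pos` from its 3' end; that position faces the
    target base `offset` positions away from the polymorphism.
    """
    cells = []
    poly_index = len(left_flank)  # index of the polymorphic base
    for allele in alleles:
        target = left_flank + allele + right_flank  # written 5'->3'
        for offset in range(-window, window + 1):
            start = poly_index + offset - (sub_pos - 1)
            segment = target[start:start + probe_len]
            if start < 0 or len(segment) < probe_len:
                raise ValueError("flanking sequence too short for this probe")
            probe = [COMPLEMENT[b] for b in segment]
            for base in "ACGT":
                variant = probe.copy()
                variant[sub_pos - 1] = base  # substitute the interrogated base
                cells.append((allele, offset, base, "".join(variant)))
    return cells


# WI-567 flanks as printed in Figure 3; alleles A/G are an inference.
block = tile_detection_block("TGCTGCCTTGGTTC", ("A", "G"), "AGCCCTCATCTCTTT")
```

Calling the function with sub_pos set to 7, 10 and 13 would give the three blocks (20/7, 20/10 and 20/13) described for each marker. With sub_pos=10 and N replaced by the matching base, the first cell reproduces the first probe row of Figure 3.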

Example 2 - Detection of Polymorphisms

A target nucleic acid is generated from PCR products amplified by primers flanking the markers. These amplicons
can be produced singly or in multiplexed reactions. Target can be produced as ss-DNA by asymmetric PCR from one
primer flanking the polymorphism, as ds-DNA, or as RNA transcribed in vitro from promoters linked to the primers.
Fluorescent or biotin label is introduced into target directly as dye or biotin-bearing nucleotides. Biotin labelled target
is then bound after amplification using dye-streptavidin complexes to incorporated biotin containing nucleotides. In
DNA produced by symmetric or asymmetric PCR, fluorescent dye is linked directly to the 5' end of the primer.

Hybridization of target to the arrays tiled in Example 1, and subsequent washing, are carried out with standard
solutions of salt (SSPE, TMACl) and nonionic detergent (Triton-X100), with or without added organic solvent
(formamide). Targets and markers generating strong signals are washed under stringent hybridization conditions (37-40°C;
10% formamide; 0.25xSSPE washes) to give highly discriminating detection of the genotype. Markers giving lower
hybridization intensity are washed under less stringent conditions (≤30°C; 3M TMACl, or 6xSSPE; 6x and 1x SSPE
washes) to yield highly discriminating detection of the genotype.

Detection of one polymorphic marker is illustrated in Figure 3. Specifically, a typical detection block is shown for
the polymorphism denoted WI-1959, having the sequence 5'-AGCAAAAATGAGTG[T/C]GGGTAACTGAGAGTG-3'
with the polymorphism indicated by the brackets (Figure 3, top), for which all three genotypes are available (T/C
heterozygote, C/C homozygote and T/T homozygote). The expected hybridization pattern for the homozygote and
heterozygote targets are shown in Figure 3, bottom. Three chips were tiled with each chip including the illustrated detection
block. Each block contained probes having the substituted bases at the 7th, 10th and 13th positions from the 3' end
of 20-mer oligonucleotide probes (20/7, 20/10 and 20/13, respectively). These alternate detection blocks were tiled to
provide a variety of sequences flanking the polymorphism itself, to ensure at least one detection block hybridizing with
a sufficiently low background intensity for adequate detection.

Fluorouracil containing RNA was synthesized from a T7 promoter on the upstream primer, hybridized to the
detection array in 5xSSPE + Triton-X100 at 30°C, and washed in 0.25xSSPE at room temperature. As shown in the scan
in Figure 3, middle, fluorescent scans of the arrays correctly identified the 5 homozygote or 10 heterozygote features.

While the foregoing invention has been described in some detail for purposes of clarity and understanding, it will
be clear to one skilled in the art from a reading of this disclosure that various changes in form and detail can be made
without departing from the true scope of the invention. All publications and patent documents cited in this application
are incorporated by reference in their entirety for all purposes to the same extent as if each individual publication or
patent document were so individually denoted.
APPENDIX A

SOFTWARE APPENDIX

# Takes input from a [illegible] chip CEL file (115 x 130) and
# extracts ratio information for every block on the chip

BEGIN {
ratpatcutoff = 1.2
patcontrol = "yes"
base[0] = "A"
base[1] = "C"
base[2] = "G"
najRa(0.O] ■ 'Wl-SfiV 
ht.x(0.0) • "CTACCC 
nam* (1,01 ■ 'Wr-Si?' 
haxd.Ol • •TCAGAC' 
namalJ.OJ a 'Wl-597' 
h«x(2,0J • 'TCGATA' 
naa«(l,0| ■ 'Wl-iai' 
hftxfJ.Ol - -AACrXA* 

haxt4,01 ■ •crrsAG* 
tvinolS.Ol • 'WI-SO:- 

h«x(S.oi ■ -cATcrr* 

RA.T.ot«.0: - ■WI-1099* 
h«xl«.01 • -CACATA* 

30 n«i««t7,01 • •w:-ii4T 

hax(?.0] « -ACCACC* 
nam* (8.01 • *w:-l]2S- 
hax(8,01 • •CTCTAC* 
nana;9.01 ■ 'Wl-u:?' 
h«x(9<01 - •CTCTfT" 

35 nam«(0,I) ■ •WI.179S' 

hax[0,ll • 'AAACTS' 
n«a«(I.l.l m •WZ-1S23* 
h«x(l.l) • -CTCTrC" 
nan«(2.ll « "WI-;a79' 
h«x(2,11 ■ •TACrCT' 
nomoO.ll ■ «WX-1888* 
h«xl3.11 • -ATCACA* 
n4iTMil4.1I - •Vfl-19ia* 

h«xt4.ii « -rrcTn" 
&am«(9.I] • •Wl-1959" 
h«x(3,II • •TCTCCC' 

4B aaM[6.1J « *WZ-1741* 

b«x[«.ll • 'CAAOSC* 
ri*maC7.lJ • •Wl-I7fi0' 
h«x(7,il • 'ACCACA* 
namc(8.11 « 'WX-1799- 
h«x(8,ll • -TCGATA* 

SO naat«(9.11 ■ •WI-1973' 

haxt9.11 > -CAAGAC* 
njtA«(0.21 > 'WX-igaO' 
h«x[0\ Jl ■ 'AACTGA' 
na.-na(1.21 * *w:-201S- 
hex [I.:] - 'GACTCT- 
r.air.e(2.21 - -wi-JSeA' 
nox[2.21 > *CCA3A.C* 
r.am«{3.2l » •wi-40i:' 
h«xt3.21 - -CTAGTC- 
r.tt.-n«t4.2] - •WI-7 56'?' 
hoxtl.2] • 'TACTCA- 
r.an«(3.21 - 'w:-:i:?3- 
h«x(3.21 ■ -TACACC- 
r.anio(6.21 • *ai4.L6' 
h«x(6.2! • 'CATAAT- 
nAn>o;7.21 • 'WI-STCl' 
t;tx(1.21 ■ "ACTCCA" 
r.Ain«tQ-21 • "Wl-filJl* 
hoxfa,21 - 'CCCACA' 
r.am«t9.2] • "WI-STa?' 
h«xt9.21 • •ACAcrr- 
r.am«t0.3I ■ 'WI-SSIO* 
h«xt0.31 • •TACTTC- 

20 h«xll.3] • •TTGArr* 

n4in«t2,3I ■ 'XDH3' 
hex (2. 3] - •ATASTT- 
njjn«[3.3] ■ 'ACT' 
h«x[3.3) • 'GACTGC- 

h«xt*,31 • •rrCTCC- 
r.«jn«f5.31 • -^1.003-2* 
hex t 5. 3] • 'CCAGAT- 
r.djn«(6.31 ■ 'APCD* 
hexl6.31 • -ACTCCT- 
30 r.*me[T.3] • 'APCE C l32T/':i ■ 

h«xl7,31 ■ 'TCTCGC 
ti«meJ8<3J • 'APCECaaOT/c; ' 
h«xl8.3) ■ 'ACTCCC- 
name (9.31 • 'AJISB' 
hext9.3J • -TCCATC- 
^5 r.4in«(0,4l ■ 'ATlA* 

h«x(o.4i « •crrccc" 

nAmetl,4j - 'ATlb* 
hex I L. 41 - •CCACTT- 
n*jn«t2.4| ■ •BCL2* 
hex (2. 41 ■ -ACCACG- 
nAin«{3,4l - 'BRCAla* 
hext3.4l n 'CATCTC- 
n«m4i(4.4l ■ 'BRCAlb* 
hexl4,4I • 'ACACAC- 
MJnat5,4I - 'BRCAIC 
45 hexl5.4] ■ -CAAGAC 

nam«t6.4l « 'OJsa' 
hexl6,41 ■ 'CCAGCT- 
r.*ingt7,4J ■ '03S11* 
hex 17, 4] « 'TCTGRR- 
nain«[8.4l - •03312' 
hexl0.41 - •CCACCC 
r.4ffe(9.4l ■ 'ORCJ* 
hex{9,4l « -CACTCG- 
r.AJn«:o,51 - •rAflP2- 
hex(0,51 ■ 'CCGACT- 
h«x(1.5I ■ -GACACA- 
naini![2.51 a 'WTZ- 
h«xl2,3I ■ -CTCTGC' 
nam* [3. 51 - *HT4' 
h«x(3.51 - -TCCAAT* 
nAAto { 4 , 5 1 - • ' KTS ' 
h«xt4.5) ■ -ACTCCA* 
na.T>«(5.51 • TCn* 
h«x(9.S) • 'GGCACC* 
nA.-n«(6.S1 - 'ICKV4-S* 
h«x(«,5I - -TCTCCA* 

h«x{7.51 ■ "TCTACC* 

h«xie,5I ■ -GCCTAA* 
n«n«[9,51 • *LF7 5' 
h«xt9.51 - 'CCASGC 
nam«(0.6] - 'LPL* 
h«xtO.SJ - 'ACCTAC 
naattCl.6) - 'MCC* 
h«xtI.Sl ■ 'OCCTGA* 
n4««(3,«l « -HETH* 
h«x{2.6J • •CCCTCG' 
nani4(3.6I - 'NRAMP' 
h*xt3.6) • 'CAGATC* 
nam«(4,61 • *PAJl' 
h«x(4.6] ■ •ACATTG* 

hox{S,£1 « *GAAGCA* 
fi] • *PP?3R1* 
50 h«x(6,«l ■ •CACTAA- 

n*ni«(7.61 ■ *R05* 
hax(7.$I ■ *AGCACS* 
naai«t8.$1 • '914544' 
hex(8,fi! - "TCTGCT* 
n«in«[9,fi] « 'S1S3A* 
35 h«x(9.61 « 'CCCATG' 

nain«;0.71 ■ 'TcR-CAl' 
h«xt0,7| « •TCCCGT' 
named, 71 • 'TcR-CBaa' 
h«xtl,71 « •CCCTCC' 
nam«(2.71 • •TcR-CB23' 
h«xr2.71 ■ "CTCTAC' 
nam«(3.7] - •TcR-Ca24' 
h«x(3.71 ■ -CTCATC* 
nan* {4. 7] • 'TcR-CB23' 
h«x(4.71 • •CTACCC" 
4S nam«tS.71 - 'TcR-CSa?' 

hext5.7] - -ACCTrA' 
nAm«[«.7] • -VB12«* 
h«x(6.7] - -ACACTG- 
nAin«(7,7I ■ 'VB12b- 
h«xt7.71 - 'CACTCA* 
bkgsum = 0
bkgnum = 0

}
{

readthis = 1

if ($1 ~ /[A-Za-z]/ || $2 ~ /[A-Za-z]/) readthis = 0
if (readthis == 1) rawdata[$1,$2] = $3

if ($1>1 && $2>1) if ($1<113 && $2<128) if ($1<90 || $2<109)

{

px = int(($1-3)/11)
py = int(($2-5)/15)
pxo = (11*px)+3
pyo = (15*py)+5

mx = $1-pxo
by = $2-pyo

block = 3*int(by/5)+7
if (by%5 != 4 && mx != 10)

{

sb = base[by%5]
sig[px,py,block,sb,mx] = $3

}

if (by%5 == 4 || mx == 10)

{

bkgsum += $3

bkgnum++

}

}

}

princf Cbackcrour.d • *5.2f\n-, b)co»um/bJc(p:ur«) 
princf -MARXERVtflSTBLK\tJUT:0\;\CCBMCHSCK\t\=P*.raATVn* 
for (py=0;py<8;py++) for (px=0;px<10;px++) if (py < 7 || px < 8)
{

m[0] = substr(hex[px,py],1,1)
m[1] = substr(hex[px,py],1,1)
m[2] = substr(hex[px,py],2,1)
m[3] = substr(hex[px,py],2,1)
m[4] = substr(hex[px,py],3,2)
m[5] = substr(hex[px,py],3,2)
m[6] = substr(hex[px,py],5,1)
m[7] = substr(hex[px,py],5,1)
m[8] = substr(hex[px,py],6,1)
m[9] = substr(hex[px,py],6,1)

center = substr(hex[px,py],3,1) "/" substr(hex[px,py],4,1)
pentwrer • ir,[0) ' -^1(2) M -center* j •in;61 * -m:81 
header • • I -px*!* . 'pyl' ! ' namelpx.pyl 'Vn* pontemor "vrv- 
40 headprint ■ 0 

( 

for (j=0;j<=2;j++)
{

block = (j*3)+7
num2 = 0
den2 = 0
num1 = 0
den1 = 0
x2 = 0
n1 = 0
n2 = 0

for (f=0;f<5;f++)

{

maxhi[px,py,block,f] = 0

for (g=0;g<4;g++) maxlo[px,py,block,g,f] = 0

}
Cor {kaO;»c<«9;k«..) foe :b»0 ;b<-J ;b*-: 
{ 

J = ■ lnt(k/2) 

signal ■ •i9(;}c.py, block. b«s«tb! .k) 
omit 0 

U ((n(k) . baa«(b)) cmie « 1 
Lt (omit 1) 

( 

10 q • nuochi (px.py. block, zl 

if (aignAl > q) tu.xhi(px. py, tlock. : | afllgmAl 

} 

it (orait CI 
{ 

q ■ mAxlotpx.py.block.b. = 1 

If (tigr.Al, > q) ;i4,xlo (px. py, block. b. =) •ai^Al 
if ikM — 01 

( 

) 

i; — 11 

{ 

d«r.2 ilTnal 

) 

) 

1£ (omit >- I) if (k*«4 II k>«5) 

{ 

if (b4»«(b! •■ Jxibsir (ha.x(px.ey] ,3.1) ) 
( 

nxiifC signal 

) 

If (bas«[b) subacr(h«x(px.py) .4. 1) ) 

( 

d«nl signal 

) 
) 

} 

maxhisum = 0
for (f=0;f<5;f++)
{

maxhisum += maxhi[px,py,block,f]

}

maxhiav = maxhisum/5
maxlosum = 0

for (g=0;g<5;g++) for (v=0;v<4;v++)
{

maxlosum += maxlo[px,py,block,v,g]

}

maxloav = maxlosum/14
maxrat = maxhiav/maxloav
num = ((num1/2)-(num2/n1))
if (num < 0) num = 0
den = ((den1/2)-(den2/n2))
if (den <= 0) den = 0.001
ratio = num/den
max = num1/2
if (den1/2 > max) max = den1/2
n = n1+n2

stdvxnum = ((n*x2) - ((num2+den2)^2))
if (stdvxnum < 0) stdvx = 0
stdvx = (stdvxnum/(n^2))^(0.5)
if (maxrat > ratpatcutoff || patcontrol == "no")
{

if (headprint == 0)
{

printf header
headprint = 1

}

printf "\t20/" block "\t"
printf ("%3.2f\t", ratio)

if (ratio < 10000) printf "\t"
rat = ratio

if (ratio <= 0) rat = .00001
lograt = log(rat)/log(10)
printf ("%2.2f\t", 10*lograt)
printf ("%2.2f", max/stdvx)
if (max/stdvx < 2) printf "\tFAIL\t"
if (max/stdvx >= 2) printf "\t\t"
printf ("%2.2f", maxrat)

if (maxrat > ratpatcutoff) printf "\tGOODPAT"
printf "\n"

}
Claims

1. An array of oligonucleotide probes for detecting a polymorphism in a target nucleic acid sequence, said array
comprising at least one detection block of probes, said detection block including first and second groups of probes
that are complementary to said target nucleic acid sequence having first and second variants of said polymorphism,
respectively, and further comprising third and fourth groups of probes, said third and fourth groups of probes having
a sequence identical to said first and second groups of probes, respectively, except that said third and fourth groups
of probes include all possible monosubstitutions of positions in said sequence that are within n bases of a base in
said sequence that is complementary to said polymorphism, wherein n is from 0 to 5.

2. The array of claim 1, wherein n is 2.

3. The array of claim 1, wherein said first and second groups of probes comprise a plurality of different probes that
are complementary to overlapping portions of said target nucleic acid sequence.

4. The array of claim 1, wherein said detection block further comprises said third and fourth groups of probes wherein
said monosubstitutions occur at a plurality of distances from a 3' end of said probes.

5. The array of claim 1, wherein said detection block includes between about 6 and 66 different probes.

6. The array of claim 1, comprising between 1 and 1,000 different detection blocks, each of said detection blocks
including probes complementary to first and second variants of a different polymorphism in said target nucleic acid
sequence.

7. The array of claim 1, wherein each of said detection blocks further comprises fifth and sixth groups of probes, said
fifth and sixth groups of probes being complementary to said first and second groups of probes, respectively.

8. A method of identifying whether a target nucleic acid sequence includes a polymorphic variant, comprising:

hybridizing said target nucleic acid sequence to an array of oligonucleotide probes, said array comprising at
least one detection block of probes, said detection block including first and second groups of probes that are
complementary to said target nucleic acid sequence having first and second variants of said polymorphism,
respectively, and further comprising third and fourth groups of probes, said third and fourth groups of probes
having a sequence identical to said first and second groups of probes, respectively, except that said third and
fourth groups of probes include all possible monosubstitutions of positions in said sequence that are within n
bases of a base in said sequence that is complementary to said polymorphism, wherein n is from 0 to 5; and
determining which of said first and second groups of probes hybridizes with said target nucleic acid sequence
to identify said polymorphic variant.

9. The method of claim 8, wherein said target nucleic acid comprises a detectable label.

10. The method of claim 9, wherein said detectable label is a fluorescent group.

11. The method of claim 9, wherein said label is a binding group.

12. The method of claim 11, wherein said binding group is selected from biotin, avidin and streptavidin.

13. The method of claim 8, wherein said detection block includes fifth and sixth groups of probes, said fifth and sixth
groups of probes being complementary to first and second variants of an antisense strand of said target sequence.

14. The method of claim 8, wherein said step of determining comprises:

calculating a ratio of hybridization intensity of said target nucleic acid to said first group of probes versus
hybridization intensity of said target nucleic acid to said second group of probes;
and identifying a homozygote for said first variant when said ratio is greater than 2, a homozygote for said
second variant when said ratio is less than 0.5, and a heterozygote when said ratio is between about 0.7 and
1.5.
FIG. 2B

SEQUENCE AT POLYMORPHISM WI-567

5' TGCTGCCTTGGTTC[A/G]AGCCCTCATCTCTTT 3'

THREE BLOCKS OF COMPLEMENTARY 20-MER OLIGOS WITH SUBSTITUTIONS (N)
7, 10 AND 13 BP FROM THE 3' END

Columns: P-2 P-1 P P+1 P+2, each with an A and a G column (BASES IN THE SHADED COLUMNS)

3' BLOCK 20/7 5'
AACCAAN[C]TCGGGAGTAGAG

3' BLOCK 20/10 5'
CGGAACCAAN[C]TCGGGAGTA

3' BLOCK 20/13 5'
CGACGGAACCAAN[C]TCGGGA
FIG. 3

SEQUENCE AT POLYMORPHISM WI-567

5' TGCTGCCTTGGTTC[A/G]AGCCCTCATCTCTTT 3'

SYNTHESIZE BLOCK OF COMPLEMENTARY
20-MER OLIGOS WITH SUBSTITUTIONS (N)
10 BP FROM THE 3' END

3' to 5':
ACGGAACCANG[T]TCGGGAGT
ACGGAACCANG[C]TCGGGAGT
CGGAACCAAN[T]TCGGGAGTA
CGGAACCAAN[C]TCGGGAGTA
GGAACCAAG[N]TCGGGAGTAG
GGAACCAAG[N]TCGGGAGTAG
GAACCAAG[T]NCGGGAGTAGA
GAACCAAG[C]NCGGGAGTAGA
AACCAAG[T]TNGGGAGTAGAG
AACCAAG[C]TNGGGAGTAGAG

Columns: P-2 P-1 P P+1 P+2, each with an A and a G column (P = POLYMORPHISM)

PREDICTED PATTERNS (one panel per GENOTYPE; images not reproduced)
FIG. 7

P-2 P-1 P0 P+1 P+2
X Y X Y X Y X Y X Y

FIG. 8

Column headers: T C A G over A G A G A G A G A G (cell shading and N/D/n/d labels not reproduced)

FIG. 9

( START )
202: RECEIVE CELL INTENSITIES
204: CALCULATE RATIO = (PM(x) - MM(x)) / (PM(y) - MM(y))
206: CALCULATE DB = 10*log10(RATIO)
208: PERFORM STATISTICAL CHECK ON DATA
210: ANALYZE DB
212: HOMOZYGOTE WITH FIRST MARKER
214: HETEROZYGOTE (DB NEAR 0)
216: HOMOZYGOTE WITH SECOND MARKER
( END )






EP 0 765 260 A2 



[Figure pages: grids of marker labels, largely illegible in this copy; legible labels include AGT, ATlb, GNAT2, ILIA, LMP2 and PXMPI.]